Predictive Maintenance for Centrifugal Pumps

Project Description

This project focuses on developing a Predictive Maintenance System for Centrifugal Pumps used in chemical industries. By leveraging machine learning algorithms and sensor data, the goal is to predict potential failures before they occur, optimizing maintenance schedules, minimizing downtime, and reducing operational costs.

Dataset Parameters

The dataset simulates operational data collected from centrifugal pumps. Key parameters include:

  • Air Temperature [K]: Ambient temperature near the equipment.
  • Process Temperature [K]: Temperature of the fluid being pumped.
  • Rotational Speed [rpm]: Speed of the pump impeller.
  • Torque [Nm]: Motor torque applied to drive the pump.
  • Tool Wear [min]: Cumulative wear of critical components like bearings and impellers.
  • Target: Binary indicator of pump failure (0 = no failure, 1 = failure).
  • Failure Type: One of six categories: No Failure, Power Failure, Tool Wear Failure, Overstrain Failure, Random Failures, Heat Dissipation Failure.

What Are Centrifugal Pumps?

Centrifugal pumps are mechanical devices designed to move fluids by converting rotational kinetic energy from a motor into hydrodynamic energy. They operate based on centrifugal force, where the rotation of an impeller increases the fluid's velocity and pressure.

Uses in Chemical Industries

  • Fluid Transfer: Transporting chemicals, solvents, and process liquids across different units.
  • Reaction Processes: Circulating reactants in chemical reactors.
  • Cooling Systems: Pumping cooling water in heat exchangers.
  • Filtration Systems: Driving fluids through filtration units.

Centrifugal pumps are indispensable in chemical manufacturing, ensuring smooth and efficient operations.

Predictive Maintenance

Predictive maintenance is a proactive strategy that uses data analysis tools and techniques to identify potential equipment failures before they occur. Unlike reactive or preventive maintenance, it optimizes maintenance schedules by predicting the actual condition of equipment.

How It Works:

  • Data Collection: Sensors monitor critical parameters like speed, temperature, and torque.
  • Data Analysis: Historical data is analyzed to identify patterns and anomalies.
  • Machine Learning Models: Algorithms predict the likelihood of failures based on sensor data.
  • Actionable Insights: Maintenance teams are alerted to repair or replace components proactively.

Why Is It Crucial and Beneficial?

  • Reduces Downtime: Minimizes unexpected breakdowns.
  • Cost-Effective: Prevents over-maintenance and reduces repair costs.
  • Improves Safety: Avoids catastrophic failures that could endanger workers or the environment.
  • Enhances Efficiency: Ensures optimal equipment performance.

Current Industry Practices

Industries are increasingly adopting machine learning for predictive maintenance. Tools like anomaly detection, time-series forecasting, and classification models are integrated with IoT-enabled systems to monitor and maintain equipment health.

  • Case Studies: Companies like GE and Siemens deploy AI-driven predictive maintenance solutions for pumps and compressors.

  • Real-Time Monitoring: Systems continuously monitor sensor data and predict failures using cloud-based platforms.

  • Scalable Solutions: Machine learning models adapt to different equipment and environments.

Approach to the Problem

This is a classification problem: given a set of operating conditions and sensor readings, we predict whether a centrifugal pump will fail and, if it does, which type of failure it is. Because both answers are needed, we build a two-step model.

The first model predicts failure or no failure. The second model then determines the type of failure.
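The two-step idea can be sketched as a small helper. This is a minimal illustration rather than the notebook's final code: `predict_two_stage` and the two rule-based stand-in classes are hypothetical names invented here, and any object with a scikit-learn-style `predict` method could be passed in their place.

```python
import numpy as np


class ThresholdFailureModel:
    """Hypothetical stand-in for the binary model: flags torque above a limit."""

    def predict(self, X):
        # Column 0 is assumed to hold torque in this toy example.
        return (X[:, 0] > 60.0).astype(int)


class ConstantTypeModel:
    """Hypothetical stand-in for the failure-type model."""

    def predict(self, X):
        return np.array(["Overstrain Failure"] * len(X))


def predict_two_stage(binary_model, type_model, X):
    """Return one label per row: 'No Failure' or the predicted failure type."""
    X = np.asarray(X, dtype=float)
    labels = np.array(["No Failure"] * len(X), dtype=object)
    is_failure = binary_model.predict(X) == 1
    if is_failure.any():
        # Step 2 runs only on the rows that step 1 flagged as failures.
        labels[is_failure] = type_model.predict(X[is_failure])
    return labels.tolist()


readings = [[42.8], [68.0]]  # made-up torque values
print(predict_two_stage(ThresholdFailureModel(), ConstantTypeModel(), readings))
```

Running the second model only on flagged rows keeps the two decisions separate, so each model can be evaluated and retrained on its own.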

In [1]:
import numpy as np 
import pandas as pd
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")


sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 8)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
In [2]:
df = pd.read_csv('predictive_maintenance.csv')
df
Out[2]:
UDI Product ID Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target Failure Type
0 1 M14860 M 298.1 308.6 1551 42.8 0 0 No Failure
1 2 L47181 L 298.2 308.7 1408 46.3 3 0 No Failure
2 3 L47182 L 298.1 308.5 1498 49.4 5 0 No Failure
3 4 L47183 L 298.2 308.6 1433 39.5 7 0 No Failure
4 5 L47184 L 298.2 308.7 1408 40.0 9 0 No Failure
... ... ... ... ... ... ... ... ... ... ...
9995 9996 M24855 M 298.8 308.4 1604 29.5 14 0 No Failure
9996 9997 H39410 H 298.9 308.4 1632 31.8 17 0 No Failure
9997 9998 M24857 M 299.0 308.6 1645 33.4 22 0 No Failure
9998 9999 H39412 H 299.0 308.7 1408 48.5 25 0 No Failure
9999 10000 M24859 M 299.0 308.7 1500 40.2 30 0 No Failure

10000 rows × 10 columns

In [3]:
df_copy = df.copy()
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   UDI                      10000 non-null  int64  
 1   Product ID               10000 non-null  object 
 2   Type                     10000 non-null  object 
 3   Air temperature [K]      10000 non-null  float64
 4   Process temperature [K]  10000 non-null  float64
 5   Rotational speed [rpm]   10000 non-null  int64  
 6   Torque [Nm]              10000 non-null  float64
 7   Tool wear [min]          10000 non-null  int64  
 8   Target                   10000 non-null  int64  
 9   Failure Type             10000 non-null  object 
dtypes: float64(3), int64(4), object(3)
memory usage: 781.4+ KB
  1. There are no null values in any column.
  2. The columns use three data types: float64, int64, and object.
In [5]:
df.describe()
Out[5]:
UDI Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target
count 10000.00000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000 10000.000000
mean 5000.50000 300.004930 310.005560 1538.776100 39.986910 107.951000 0.033900
std 2886.89568 2.000259 1.483734 179.284096 9.968934 63.654147 0.180981
min 1.00000 295.300000 305.700000 1168.000000 3.800000 0.000000 0.000000
25% 2500.75000 298.300000 308.800000 1423.000000 33.200000 53.000000 0.000000
50% 5000.50000 300.100000 310.100000 1503.000000 40.100000 108.000000 0.000000
75% 7500.25000 301.500000 311.100000 1612.000000 46.800000 162.000000 0.000000
max 10000.00000 304.500000 313.800000 2886.000000 76.600000 253.000000 1.000000
  1. The data looks reasonable: the minimum and maximum values are plausible for every column.
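The mean of `Target` in the summary above (0.0339) is also the share of failure rows, because the column only holds 0 and 1. A minimal sketch of checking class balance this way, using a small made-up series rather than the real data:

```python
import pandas as pd

# Made-up labels for illustration: 2 failures out of 50 rows.
target = pd.Series([0] * 48 + [1] * 2, name="Target")

counts = target.value_counts().sort_index()   # rows per class
failure_rate = target.mean()                  # mean of a 0/1 column = share of 1s

print(counts.to_dict())
print(f"failure rate: {failure_rate:.2%}")
```

On the real data, a mean of 0.0339 over 10,000 rows corresponds to 339 failures, so the classes are heavily imbalanced.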

Exploratory Analysis and Visualization

In [6]:
fig = px.histogram(df, x='Air temperature [K]', marginal='box', nbins=100, title='Distribution of Air temperature [K]')
fig.update_layout(bargap=0.2)
fig.show()
In [7]:
fig = px.histogram(df, x='Process temperature [K]', marginal='box', nbins=100, title='Distribution of Process temperature [K]', color_discrete_sequence=['green'])
fig.update_layout(bargap=0.2)
fig.show()

From the plots above we can see the range of temperatures recorded by the sensors; the maxima are 304.5 K for air and 313.8 K for the process fluid.

In [8]:
# Cast Target to str so Plotly treats it as a discrete category, not a continuous scale.
df['Target'] = df['Target'].astype(int).astype(str)

fig = px.scatter(df,
                 x='Rotational speed [rpm]',
                 y='Torque [Nm]',
                 opacity=1,
                 title='Rotational speed [rpm] vs Torque [Nm]',
                 color='Target',
                 color_discrete_sequence=['blue', 'red'])

# Small markers keep the 10,000 points readable.
fig.update_traces(marker=dict(size=2))
fig.update_layout(width=1200, height=700)

fig.show()
In [9]:
df['Target'].unique()
Out[9]:
array(['0', '1'], dtype=object)
In [10]:
# Cast Target to str so Plotly treats it as a discrete category, not a continuous scale.
df['Target'] = df['Target'].astype(int).astype(str)

fig = px.scatter(df,
                 x='Process temperature [K]',
                 y='Torque [Nm]',
                 opacity=1,
                 title='Process temperature [K] vs Torque [Nm]',
                 color='Target',
                 color_discrete_sequence=['blue', 'red'])

# Small markers keep the 10,000 points readable.
fig.update_traces(marker=dict(size=2))
fig.update_layout(width=1000, height=700)

fig.show()
In [11]:
# Cast Target to str so Plotly treats it as a discrete category, not a continuous scale.
df['Target'] = df['Target'].astype(int).astype(str)

fig = px.scatter(df,
                 x='Air temperature [K]',
                 y='Torque [Nm]',
                 opacity=1,
                 title='Air temperature [K] vs Torque [Nm]',
                 color='Target',
                 color_discrete_sequence=['blue', 'red'])

# Small markers keep the 10,000 points readable.
fig.update_traces(marker=dict(size=2))
fig.update_layout(width=1000, height=700)

fig.show()
In [12]:
# Keep Target as a string category, consistent with the plots above.
df['Target'] = df['Target'].astype(int).astype(str)

# Same speed/torque scatter, now colored by failure type instead of the binary target.
fig = px.scatter(df,
                 x='Rotational speed [rpm]',
                 y='Torque [Nm]',
                 opacity=1,
                 title='Rotational speed [rpm] vs Torque [Nm] by Failure Type',
                 color='Failure Type')

# Set marker size
fig.update_traces(marker=dict(size=2))

# Set figure size
fig.update_layout(width=1000, height=700)

fig.show()

FEATURE ENGINEERING

In [13]:
df
Out[13]:
UDI Product ID Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target Failure Type
0 1 M14860 M 298.1 308.6 1551 42.8 0 0 No Failure
1 2 L47181 L 298.2 308.7 1408 46.3 3 0 No Failure
2 3 L47182 L 298.1 308.5 1498 49.4 5 0 No Failure
3 4 L47183 L 298.2 308.6 1433 39.5 7 0 No Failure
4 5 L47184 L 298.2 308.7 1408 40.0 9 0 No Failure
... ... ... ... ... ... ... ... ... ... ...
9995 9996 M24855 M 298.8 308.4 1604 29.5 14 0 No Failure
9996 9997 H39410 H 298.9 308.4 1632 31.8 17 0 No Failure
9997 9998 M24857 M 299.0 308.6 1645 33.4 22 0 No Failure
9998 9999 H39412 H 299.0 308.7 1408 48.5 25 0 No Failure
9999 10000 M24859 M 299.0 308.7 1500 40.2 30 0 No Failure

10000 rows × 10 columns

Because the first model is trained on the Target column, we will drop the Failure Type column before training it. First, we add a few derived features.

In [14]:
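# Rolling statistics of speed over the last 10 rows (this treats row order as time order).
# min_periods=1 lets the first rows use however many readings are available.
df['rolling_mean_rpm'] = df['Rotational speed [rpm]'].rolling(window=10, min_periods=1).mean()
df['rolling_std_rpm'] = df['Rotational speed [rpm]'].rolling(window=10, min_periods=1).std()
df['rolling_var_rpm'] = df['Rotational speed [rpm]'].rolling(window=10, min_periods=1).var()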
In [15]:
df
Out[15]:
UDI Product ID Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target Failure Type rolling_mean_rpm rolling_std_rpm rolling_var_rpm
0 1 M14860 M 298.1 308.6 1551 42.8 0 0 No Failure 1551.000000 NaN NaN
1 2 L47181 L 298.2 308.7 1408 46.3 3 0 No Failure 1479.500000 101.116270 10224.500000
2 3 L47182 L 298.1 308.5 1498 49.4 5 0 No Failure 1485.666667 72.293384 5226.333333
3 4 L47183 L 298.2 308.6 1433 39.5 7 0 No Failure 1472.500000 64.634872 4177.666667
4 5 L47184 L 298.2 308.7 1408 40.0 9 0 No Failure 1459.600000 62.970628 3965.300000
... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 9996 M24855 M 298.8 308.4 1604 29.5 14 0 No Failure 1583.200000 131.944264 17409.288889
9996 9997 H39410 H 298.9 308.4 1632 31.8 17 0 No Failure 1595.700000 129.827278 16855.122222
9997 9998 M24857 M 299.0 308.6 1645 33.4 22 0 No Failure 1610.200000 125.991887 15873.955556
9998 9999 H39412 H 299.0 308.7 1408 48.5 25 0 No Failure 1573.900000 126.805582 16079.655556
9999 10000 M24859 M 299.0 308.7 1500 40.2 30 0 No Failure 1566.200000 128.916683 16619.511111

10000 rows × 13 columns

In [16]:
# Row-to-row change in torque and process temperature (NaN for the first row).
df['torque_trend'] = df['Torque [Nm]'].diff()
df['temperature_trend'] = df['Process temperature [K]'].diff()
In [17]:
df
Out[17]:
UDI Product ID Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target Failure Type rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend
0 1 M14860 M 298.1 308.6 1551 42.8 0 0 No Failure 1551.000000 NaN NaN NaN NaN
1 2 L47181 L 298.2 308.7 1408 46.3 3 0 No Failure 1479.500000 101.116270 10224.500000 3.5 0.1
2 3 L47182 L 298.1 308.5 1498 49.4 5 0 No Failure 1485.666667 72.293384 5226.333333 3.1 -0.2
3 4 L47183 L 298.2 308.6 1433 39.5 7 0 No Failure 1472.500000 64.634872 4177.666667 -9.9 0.1
4 5 L47184 L 298.2 308.7 1408 40.0 9 0 No Failure 1459.600000 62.970628 3965.300000 0.5 0.1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 9996 M24855 M 298.8 308.4 1604 29.5 14 0 No Failure 1583.200000 131.944264 17409.288889 1.6 0.1
9996 9997 H39410 H 298.9 308.4 1632 31.8 17 0 No Failure 1595.700000 129.827278 16855.122222 2.3 0.0
9997 9998 M24857 M 299.0 308.6 1645 33.4 22 0 No Failure 1610.200000 125.991887 15873.955556 1.6 0.2
9998 9999 H39412 H 299.0 308.7 1408 48.5 25 0 No Failure 1573.900000 126.805582 16079.655556 15.1 0.1
9999 10000 M24859 M 299.0 308.7 1500 40.2 30 0 No Failure 1566.200000 128.916683 16619.511111 -8.3 0.0

10000 rows × 15 columns
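The `NaN` values in the first row are expected: a sample standard deviation needs at least two observations, and `diff()` has no previous row to subtract. A minimal sketch on the first three speed and torque readings from the table reproduces the values shown:

```python
import pandas as pd

rpm = pd.Series([1551, 1408, 1498])       # first three speed readings
torque = pd.Series([42.8, 46.3, 49.4])    # first three torque readings

features = pd.DataFrame({
    "rolling_mean_rpm": rpm.rolling(window=10, min_periods=1).mean(),
    "rolling_std_rpm": rpm.rolling(window=10, min_periods=1).std(),
    "torque_trend": torque.diff(),
})
print(features.round(3))
```

This is why the notebook drops row 0 before training: it is the only row where the derived features are undefined.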

In [18]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['Failure Type Encoded'] = le.fit_transform(df['Failure Type'])
dict(zip(le.classes_, le.transform(le.classes_)))
Out[18]:
{'Heat Dissipation Failure': np.int64(0),
 'No Failure': np.int64(1),
 'Overstrain Failure': np.int64(2),
 'Power Failure': np.int64(3),
 'Random Failures': np.int64(4),
 'Tool Wear Failure': np.int64(5)}
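`LabelEncoder` assigns codes in sorted (alphabetical) order of the class names, which is why `No Failure` maps to 1 rather than 0. A short sketch of recovering the names from codes with `inverse_transform`, so the mapping does not have to be retyped by hand:

```python
from sklearn.preprocessing import LabelEncoder

failure_types = ["No Failure", "Power Failure", "Tool Wear Failure",
                 "Overstrain Failure", "Random Failures", "Heat Dissipation Failure"]

le = LabelEncoder()
codes = le.fit_transform(failure_types)

print(list(le.classes_))               # class names in sorted order
print(le.inverse_transform([0, 5]))    # codes back to names
```

Keeping the fitted encoder around is safer than a hand-written dictionary, since the codes depend on which class names were present at fit time.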
In [19]:
df = df.drop('Failure Type', axis=1)
In [20]:
df
Out[20]:
UDI Product ID Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Failure Type Encoded
0 1 M14860 M 298.1 308.6 1551 42.8 0 0 1551.000000 NaN NaN NaN NaN 1
1 2 L47181 L 298.2 308.7 1408 46.3 3 0 1479.500000 101.116270 10224.500000 3.5 0.1 1
2 3 L47182 L 298.1 308.5 1498 49.4 5 0 1485.666667 72.293384 5226.333333 3.1 -0.2 1
3 4 L47183 L 298.2 308.6 1433 39.5 7 0 1472.500000 64.634872 4177.666667 -9.9 0.1 1
4 5 L47184 L 298.2 308.7 1408 40.0 9 0 1459.600000 62.970628 3965.300000 0.5 0.1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 9996 M24855 M 298.8 308.4 1604 29.5 14 0 1583.200000 131.944264 17409.288889 1.6 0.1 1
9996 9997 H39410 H 298.9 308.4 1632 31.8 17 0 1595.700000 129.827278 16855.122222 2.3 0.0 1
9997 9998 M24857 M 299.0 308.6 1645 33.4 22 0 1610.200000 125.991887 15873.955556 1.6 0.2 1
9998 9999 H39412 H 299.0 308.7 1408 48.5 25 0 1573.900000 126.805582 16079.655556 15.1 0.1 1
9999 10000 M24859 M 299.0 308.7 1500 40.2 30 0 1566.200000 128.916683 16619.511111 -8.3 0.0 1

10000 rows × 15 columns

In [21]:
df = df.drop(columns=['UDI', 'Product ID'])
In [22]:
df
Out[22]:
Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Failure Type Encoded
0 M 298.1 308.6 1551 42.8 0 0 1551.000000 NaN NaN NaN NaN 1
1 L 298.2 308.7 1408 46.3 3 0 1479.500000 101.116270 10224.500000 3.5 0.1 1
2 L 298.1 308.5 1498 49.4 5 0 1485.666667 72.293384 5226.333333 3.1 -0.2 1
3 L 298.2 308.6 1433 39.5 7 0 1472.500000 64.634872 4177.666667 -9.9 0.1 1
4 L 298.2 308.7 1408 40.0 9 0 1459.600000 62.970628 3965.300000 0.5 0.1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 M 298.8 308.4 1604 29.5 14 0 1583.200000 131.944264 17409.288889 1.6 0.1 1
9996 H 298.9 308.4 1632 31.8 17 0 1595.700000 129.827278 16855.122222 2.3 0.0 1
9997 M 299.0 308.6 1645 33.4 22 0 1610.200000 125.991887 15873.955556 1.6 0.2 1
9998 H 299.0 308.7 1408 48.5 25 0 1573.900000 126.805582 16079.655556 15.1 0.1 1
9999 M 299.0 308.7 1500 40.2 30 0 1566.200000 128.916683 16619.511111 -8.3 0.0 1

10000 rows × 13 columns

In [23]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['Quality Type Encoded'] = le.fit_transform(df['Type'])
In [24]:
df_1 = df.copy()
In [25]:
df
Out[25]:
Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Failure Type Encoded Quality Type Encoded
0 M 298.1 308.6 1551 42.8 0 0 1551.000000 NaN NaN NaN NaN 1 2
1 L 298.2 308.7 1408 46.3 3 0 1479.500000 101.116270 10224.500000 3.5 0.1 1 1
2 L 298.1 308.5 1498 49.4 5 0 1485.666667 72.293384 5226.333333 3.1 -0.2 1 1
3 L 298.2 308.6 1433 39.5 7 0 1472.500000 64.634872 4177.666667 -9.9 0.1 1 1
4 L 298.2 308.7 1408 40.0 9 0 1459.600000 62.970628 3965.300000 0.5 0.1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 M 298.8 308.4 1604 29.5 14 0 1583.200000 131.944264 17409.288889 1.6 0.1 1 2
9996 H 298.9 308.4 1632 31.8 17 0 1595.700000 129.827278 16855.122222 2.3 0.0 1 0
9997 M 299.0 308.6 1645 33.4 22 0 1610.200000 125.991887 15873.955556 1.6 0.2 1 2
9998 H 299.0 308.7 1408 48.5 25 0 1573.900000 126.805582 16079.655556 15.1 0.1 1 0
9999 M 299.0 308.7 1500 40.2 30 0 1566.200000 128.916683 16619.511111 -8.3 0.0 1 2

10000 rows × 14 columns

In [26]:
df = df.drop('Type', axis =1)
In [27]:
df
Out[27]:
Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Failure Type Encoded Quality Type Encoded
0 298.1 308.6 1551 42.8 0 0 1551.000000 NaN NaN NaN NaN 1 2
1 298.2 308.7 1408 46.3 3 0 1479.500000 101.116270 10224.500000 3.5 0.1 1 1
2 298.1 308.5 1498 49.4 5 0 1485.666667 72.293384 5226.333333 3.1 -0.2 1 1
3 298.2 308.6 1433 39.5 7 0 1472.500000 64.634872 4177.666667 -9.9 0.1 1 1
4 298.2 308.7 1408 40.0 9 0 1459.600000 62.970628 3965.300000 0.5 0.1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 298.8 308.4 1604 29.5 14 0 1583.200000 131.944264 17409.288889 1.6 0.1 1 2
9996 298.9 308.4 1632 31.8 17 0 1595.700000 129.827278 16855.122222 2.3 0.0 1 0
9997 299.0 308.6 1645 33.4 22 0 1610.200000 125.991887 15873.955556 1.6 0.2 1 2
9998 299.0 308.7 1408 48.5 25 0 1573.900000 126.805582 16079.655556 15.1 0.1 1 0
9999 299.0 308.7 1500 40.2 30 0 1566.200000 128.916683 16619.511111 -8.3 0.0 1 2

10000 rows × 13 columns

In [28]:
# Row 0 has NaN in the rolling std/var and trend features, so drop it.
df = df.drop(index=0).reset_index(drop=True)
In [29]:
df = df.drop('Failure Type Encoded', axis =1)
In [30]:
df
Out[30]:
Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Quality Type Encoded
0 298.2 308.7 1408 46.3 3 0 1479.500000 101.116270 10224.500000 3.5 0.1 1
1 298.1 308.5 1498 49.4 5 0 1485.666667 72.293384 5226.333333 3.1 -0.2 1
2 298.2 308.6 1433 39.5 7 0 1472.500000 64.634872 4177.666667 -9.9 0.1 1
3 298.2 308.7 1408 40.0 9 0 1459.600000 62.970628 3965.300000 0.5 0.1 1
4 298.1 308.6 1425 41.9 11 0 1453.833333 58.066915 3371.766667 1.9 -0.1 2
... ... ... ... ... ... ... ... ... ... ... ... ...
9994 298.8 308.4 1604 29.5 14 0 1583.200000 131.944264 17409.288889 1.6 0.1 2
9995 298.9 308.4 1632 31.8 17 0 1595.700000 129.827278 16855.122222 2.3 0.0 0
9996 299.0 308.6 1645 33.4 22 0 1610.200000 125.991887 15873.955556 1.6 0.2 2
9997 299.0 308.7 1408 48.5 25 0 1573.900000 126.805582 16079.655556 15.1 0.1 0
9998 299.0 308.7 1500 40.2 30 0 1566.200000 128.916683 16619.511111 -8.3 0.0 2

9999 rows × 12 columns

In [31]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from xgboost import XGBClassifier

X = df.drop(columns=['Target'])
y = df['Target'].astype(int)  # Target was cast to str for plotting

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# XGBoost does not accept feature names containing '[', ']' or '<', so strip them.
X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)

model = XGBClassifier(n_estimators=100, learning_rate=0.05)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
Confusion Matrix:
[[1930    6]
 [  27   37]]

Classification Report:
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      1936
           1       0.86      0.58      0.69        64

    accuracy                           0.98      2000
   macro avg       0.92      0.79      0.84      2000
weighted avg       0.98      0.98      0.98      2000


Accuracy Score: 0.9835
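The headline accuracy hides how the failure class is doing. As a worked check, the figures in the report can be recomputed directly from the four cells of the confusion matrix above:

```python
# Confusion matrix from the XGBoost run above: rows = actual, columns = predicted.
tn, fp = 1930, 6
fn, tp = 27, 37

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)      # of predicted failures, how many were real
recall = tp / (tp + fn)         # of real failures, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

For comparison, a model that always predicts 0 would score 1936/2000 = 0.968 accuracy on this split while catching no failures at all, so recall on class 1 is the more informative number here.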

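The accuracy is high (0.9835), but the classes are heavily imbalanced: recall for the failure class is only 0.58, so the model misses 27 of the 64 failures in the test set. Let's pass a hand-made input row to see what the model predicts for it.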

In [32]:
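# Hand-made input in the same column order as X_train: air temp, process temp, rpm, torque,
# tool wear, rolling mean/std/var of rpm, torque trend, temperature trend, quality type.
custom_data = np.array([[298.2, 202.3, 3200, 35.0, 2, 1300.5, 300, 5222, -0.1, 1, 1]])

custom_pred = model.predict(custom_data)

# This model predicts Target (0 = no failure, 1 = failure), not the failure type.
print("Predicted Target:", custom_pred[0])
Predicted Target: 0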

Let's train another model on the same task.

In [33]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

X = df.drop(columns=['Target'])
y = df['Target'].astype(int)  # Target was cast to str for plotting

# Note: this split holds out 25% of the rows; the XGBoost run above held out 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Keep the same bracket-free column names used for the XGBoost model.
X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)

# 'model' now refers to the Random Forest; the inference cells below use it as step 1.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
Confusion Matrix:
[[2416    3]
 [  42   39]]

Classification Report:
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      2419
           1       0.93      0.48      0.63        81

    accuracy                           0.98      2500
   macro avg       0.96      0.74      0.81      2500
weighted avg       0.98      0.98      0.98      2500


Accuracy Score: 0.982
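With so few failure rows, a plain random split can leave the train and test sets with slightly different failure rates. One common option, which this notebook does not use, is the `stratify` argument of `train_test_split`. A minimal sketch on made-up labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 1,000 rows, 40 of them failures (4%).
X = np.arange(1000).reshape(-1, 1)
y = np.array([1] * 40 + [0] * 960)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Stratifying keeps the failure share the same in both parts.
print(y_train.mean(), y_test.mean())
```

Whether this changes the scores reported above is not something the notebook tests; the sketch only shows how the split would be set up.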

We have now trained two models with accuracy scores of 0.9835 (XGBoost) and 0.982 (Random Forest). The two runs used different test splits (20% and 25%), so the scores are not directly comparable. Our ultimate goal is also to identify the failure type, so we train one more model: if the first model predicts a failure, the second model predicts which type of failure it is.

In [34]:
df_1
Out[34]:
Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Failure Type Encoded Quality Type Encoded
0 M 298.1 308.6 1551 42.8 0 0 1551.000000 NaN NaN NaN NaN 1 2
1 L 298.2 308.7 1408 46.3 3 0 1479.500000 101.116270 10224.500000 3.5 0.1 1 1
2 L 298.1 308.5 1498 49.4 5 0 1485.666667 72.293384 5226.333333 3.1 -0.2 1 1
3 L 298.2 308.6 1433 39.5 7 0 1472.500000 64.634872 4177.666667 -9.9 0.1 1 1
4 L 298.2 308.7 1408 40.0 9 0 1459.600000 62.970628 3965.300000 0.5 0.1 1 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9995 M 298.8 308.4 1604 29.5 14 0 1583.200000 131.944264 17409.288889 1.6 0.1 1 2
9996 H 298.9 308.4 1632 31.8 17 0 1595.700000 129.827278 16855.122222 2.3 0.0 1 0
9997 M 299.0 308.6 1645 33.4 22 0 1610.200000 125.991887 15873.955556 1.6 0.2 1 2
9998 H 299.0 308.7 1408 48.5 25 0 1573.900000 126.805582 16079.655556 15.1 0.1 1 0
9999 M 299.0 308.7 1500 40.2 30 0 1566.200000 128.916683 16619.511111 -8.3 0.0 1 2

10000 rows × 14 columns

In [35]:
df_1 = df_1.drop(index=0).reset_index(drop=True)
In [36]:
df_1['Failure Type Encoded'].unique()
Out[36]:
array([1, 3, 5, 2, 4, 0])
In [37]:
df_1.sample(30)
Out[37]:
Type Air temperature [K] Process temperature [K] Rotational speed [rpm] Torque [Nm] Tool wear [min] Target rolling_mean_rpm rolling_std_rpm rolling_var_rpm torque_trend temperature_trend Failure Type Encoded Quality Type Encoded
2146 L 299.4 308.9 1550 33.7 174 0 1516.1 115.291370 13292.100000 0.8 0.1 1 1
923 L 295.5 306.0 1800 27.6 208 0 1518.4 122.254925 14946.266667 -17.7 0.0 1 1
2825 M 300.3 309.4 1612 33.1 150 0 1480.0 126.455965 15991.111111 -0.4 0.0 1 2
9347 L 298.2 308.7 1436 49.1 26 0 1504.1 116.846956 13653.211111 6.5 0.0 1 1
7310 H 300.0 310.5 1558 36.7 137 0 1479.5 142.797642 20391.166667 3.2 0.0 1 0
2620 M 299.5 309.3 1506 41.5 77 0 1611.1 311.317077 96918.322222 5.8 0.1 1 2
7940 M 300.7 311.7 1499 38.9 3 0 1558.0 160.645918 25807.111111 -1.1 0.0 1 2
9612 L 299.0 310.2 1377 62.5 92 1 1501.5 88.390862 7812.944444 30.7 0.0 3 1
7527 M 300.1 311.3 1748 26.6 42 0 1557.2 208.548316 43492.400000 -18.6 0.0 1 2
1235 H 297.1 308.4 1359 46.6 176 0 1611.0 338.146582 114343.111111 15.2 0.0 1 0
2763 M 299.9 309.3 1389 52.8 0 0 1465.1 125.725848 15806.988889 5.3 0.1 1 2
4060 L 301.9 310.8 1906 21.7 62 0 1646.0 293.130612 85925.555556 -33.8 -0.1 1 1
4659 L 303.2 311.2 1439 43.9 30 0 1562.0 144.936768 21006.666667 -0.6 0.0 1 1
2611 L 299.4 309.1 2421 14.2 57 0 1620.8 330.595355 109293.288889 -7.9 0.0 1 1
7714 L 300.5 311.5 1302 49.6 80 0 1560.0 235.973162 55683.333333 6.6 0.0 1 1
4534 L 302.4 310.2 1503 36.2 166 0 1492.6 84.128473 7077.600000 8.2 0.0 1 1
4746 L 303.3 311.2 1763 27.3 27 0 1550.6 231.892791 53774.266667 -18.3 0.0 1 1
8523 L 298.3 309.4 1468 46.2 10 0 1508.7 94.041421 8843.788889 4.5 0.1 1 1
7124 L 300.7 310.1 1261 56.6 98 0 1627.0 296.414500 87861.555556 32.1 -0.2 1 1
6137 M 300.8 310.7 1452 43.0 139 0 1567.1 181.970358 33113.211111 23.3 -0.1 1 2
2247 M 299.3 308.5 1304 60.6 8 0 1476.7 128.189140 16432.455556 12.1 0.1 1 2
8093 H 300.2 311.6 1662 28.8 162 0 1524.3 179.168481 32101.344444 -6.9 0.0 1 0
4375 M 301.9 309.6 1551 34.6 211 0 1512.2 123.878794 15345.955556 -1.7 -0.1 1 2
2973 M 300.6 309.4 1521 37.2 90 0 1489.6 147.501563 21756.711111 -8.8 0.1 1 2
4741 L 303.3 311.3 1592 33.7 14 0 1542.3 217.595981 47348.011111 -19.2 0.0 1 1
4803 L 303.7 312.6 1621 38.8 182 0 1465.2 127.659965 16297.066667 7.4 0.1 1 1
1697 L 297.9 307.6 1481 38.0 40 0 1572.2 98.111954 9625.955556 -6.0 0.0 1 1
8725 L 297.2 308.6 1562 33.0 84 0 1519.1 81.147944 6584.988889 -9.1 0.1 1 1
9067 H 297.1 308.2 1790 30.3 128 0 1692.8 252.602366 63807.955556 -26.3 0.0 1 0
5410 L 302.8 312.6 1462 39.4 22 0 1539.9 206.189476 42514.100000 -12.1 0.1 1 1
In [38]:
df_1 = df_1.drop('Type', axis =1)
In [39]:
df_1 = df_1.drop('Target', axis=1)
In [40]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


X = df_1.drop(columns=['Failure Type Encoded'])
y = df_1['Failure Type Encoded']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)


y_train = y_train.astype(int)
y_test = y_test.astype(int)


failure_type_encoded_model = RandomForestClassifier(n_estimators=100, random_state=42)
failure_type_encoded_model.fit(X_train, y_train)


y_pred = failure_type_encoded_model.predict(X_test)


print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))
Confusion Matrix:
[[  10   15    0    0    0    0]
 [   1 2415    0    0    0    0]
 [   0   13    7    1    0    0]
 [   0    5    0   16    0    0]
 [   0    4    0    0    0    0]
 [   0   12    1    0    0    0]]

Classification Report:
              precision    recall  f1-score   support

           0       0.91      0.40      0.56        25
           1       0.98      1.00      0.99      2416
           2       0.88      0.33      0.48        21
           3       0.94      0.76      0.84        21
           4       0.00      0.00      0.00         4
           5       0.00      0.00      0.00        13

    accuracy                           0.98      2500
   macro avg       0.62      0.42      0.48      2500
weighted avg       0.97      0.98      0.97      2500


Accuracy Score: 0.9792
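As with the binary model, overall accuracy is dominated by the `No Failure` class. A short sketch that recomputes per-class recall from the confusion matrix above makes the weak classes easy to spot:

```python
import numpy as np

# Confusion matrix copied from the output above: rows = actual class, columns = predicted.
cm = np.array([
    [10,   15, 0,  0, 0, 0],
    [ 1, 2415, 0,  0, 0, 0],
    [ 0,   13, 7,  1, 0, 0],
    [ 0,    5, 0, 16, 0, 0],
    [ 0,    4, 0,  0, 0, 0],
    [ 0,   12, 1,  0, 0, 0],
])

recall_per_class = cm.diagonal() / cm.sum(axis=1)   # correct / actual count per class
accuracy = cm.diagonal().sum() / cm.sum()

print(np.round(recall_per_class, 2))
print(round(accuracy, 4))
```

Classes 4 (Random Failures) and 5 (Tool Wear Failure) are never predicted correctly on this split, and most errors fall in column 1, meaning real failures are labeled `No Failure`.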
In [41]:
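# Feature order must match the training columns: air temp, process temp, rpm, torque,
# tool wear, rolling mean/std/var of rpm, torque trend, temperature trend, quality type.
custom_data = np.array([[302.0, 309.9, 38, 57.6, 197, 1527.6, 175.857392, 30925.822222, 30.4, 0.0, 0]])

# Lookup from encoded class to label (same mapping as the LabelEncoder output above).
failure_types_encoded = {
    0: "Heat Dissipation Failure",
    1: "No Failure",
    2: "Overstrain Failure",
    3: "Power Failure",
    4: "Random Failures",
    5: "Tool Wear Failure",
}

# Step 1: failure or no failure ('model' is the Random Forest trained above).
target_pred = model.predict(custom_data)
print("Predicted Target (Failure or Not):", target_pred[0])

# Step 2: only ask for the failure type when step 1 predicts a failure.
if target_pred[0] == 1:
    failure_type_pred = failure_type_encoded_model.predict(custom_data)
    print("Predicted Failure Type:", failure_types_encoded.get(failure_type_pred[0], "Unknown Failure Type"))
else:
    print("No failure detected.")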
Predicted Target (Failure or Not): 1
Predicted Failure Type: Heat Dissipation Failure
In [42]:
# A second hand-made row, in the same feature order as the training columns.
custom_data = np.array([[300.3, 309.9, 1394, 46.7, 210, 1492.4, 72.9216, 5317.600000, -5.4, 0.0, 0]])

# Lookup from encoded class to label (same mapping as the LabelEncoder output above).
failure_types_encoded = {
    0: "Heat Dissipation Failure",
    1: "No Failure",
    2: "Overstrain Failure",
    3: "Power Failure",
    4: "Random Failures",
    5: "Tool Wear Failure",
}

# Step 1: failure or no failure ('model' is the Random Forest trained above).
target_pred = model.predict(custom_data)
print("Predicted Target (Failure or Not):", target_pred[0])

# Step 2: only ask for the failure type when step 1 predicts a failure.
if target_pred[0] == 1:
    failure_type_pred = failure_type_encoded_model.predict(custom_data)
    print("Predicted Failure Type:", failure_types_encoded.get(failure_type_pred[0], "Unknown Failure Type"))
else:
    print("No failure detected.")
Predicted Target (Failure or Not): 1
Predicted Failure Type: Tool Wear Failure

Real-Time Predictive Maintenance System: How It Works

In a real-time predictive maintenance system, the model continuously receives input from the sensors on the equipment (like pumps), processes the data, and compares it against the patterns it has learned during training. Here's a step-by-step breakdown of how it works:

1. Real-Time Data Collection:

  • Sensors on the equipment (e.g., centrifugal pumps) collect real-time operational data like rotational speed (RPM), temperature, torque, vibration, and other relevant parameters.
  • This data is continuously transmitted to a central system or cloud platform via an IoT network.

2. Input to the Model:

  • Preprocessing: The raw sensor data might be preprocessed (for example, by calculating rolling means, standard deviations, or trends, as discussed earlier).
  • This preprocessed data becomes the input to the machine learning model. Every time new data is collected, it serves as fresh input to the model for analysis.
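In a streaming setting the rolling features have to be updated one reading at a time instead of over a whole DataFrame. A minimal sketch under stated assumptions: `RollingFeatures` is a hypothetical helper invented here, it covers only three of the notebook's features, and it uses the same window of 10 readings.

```python
from collections import deque
from statistics import mean, stdev


class RollingFeatures:
    """Keeps the last `window` speed readings and derives features per new reading."""

    def __init__(self, window=10):
        self.rpm = deque(maxlen=window)   # old readings fall off automatically
        self.last_torque = None

    def update(self, rpm, torque):
        self.rpm.append(rpm)
        features = {
            "rolling_mean_rpm": mean(self.rpm),
            # Sample std needs two points; None mirrors the NaN in the offline features.
            "rolling_std_rpm": stdev(self.rpm) if len(self.rpm) > 1 else None,
            "torque_trend": None if self.last_torque is None else torque - self.last_torque,
        }
        self.last_torque = torque
        return features


stream = RollingFeatures(window=10)
for rpm, torque in [(1551, 42.8), (1408, 46.3), (1498, 49.4)]:
    latest = stream.update(rpm, torque)
print(latest)
```

Fed the first three rows of the dataset, this yields the same rolling mean and standard deviation as the offline feature table, which is the property that matters: the model must see features computed the same way at training and at prediction time.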

3. Model Comparison & Prediction:

  • The trained machine learning model continuously compares the incoming real-time data to the patterns and trends it learned during the training phase.

  • The model checks whether the current values (e.g., RPM, temperature, torque) match those associated with normal operation or indicate signs of impending failure.

    For example:

    • If the RPM deviates from the expected rolling mean, it might signal that the pump is operating inefficiently, which could lead to failure.
    • If the temperature is rising unusually or fluctuating, it could indicate overheating or a malfunction.
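The "deviates from the expected rolling mean" idea can be illustrated with a simple z-score check. This is a stand-alone sketch, not what the tree models in this notebook do internally: `deviates_from_recent` is a hypothetical helper and the limit of 3 standard deviations is an arbitrary assumption.

```python
from statistics import mean, stdev


def deviates_from_recent(history, new_value, z_limit=3.0):
    """Return True if new_value lies more than z_limit sample std devs from the mean of history."""
    if len(history) < 2:
        return False            # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return new_value != history[0]
    return abs(new_value - mean(history)) / spread > z_limit


recent_rpm = [1551, 1408, 1498, 1433, 1408]    # first five speed readings from the dataset
print(deviates_from_recent(recent_rpm, 1425))  # close to the recent mean
print(deviates_from_recent(recent_rpm, 2886))  # the dataset's maximum speed
```

A rule like this can run alongside the trained model as a cheap sanity check, but its threshold would need tuning against real failure records.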

4. Failure Prediction:

  • Based on the comparison, the model makes a real-time prediction:
    • "No Failure": If the system detects that the equipment is operating normally.
    • "Failure Predicted": If the model detects signs that suggest a potential failure within a specific time window (e.g., 24 hours, 48 hours).
  • The model uses its learned thresholds (like high RPM, high torque, etc.) or patterns to determine whether an alert should be triggered.

5. Alert/Action:

  • If the model predicts a failure or detects anomalies, it alerts operators or triggers a maintenance action. The system could issue an alert like:
    • "Warning: Torque is higher than expected, indicating a potential blockage or resistance."
    • "Warning: Temperature is increasing rapidly, indicating overheating."
  • Operators or maintenance teams can then take action to prevent failure, such as adjusting the pump, performing a quick inspection, or scheduling downtime for repairs.

6. Continuous Monitoring:

  • This process happens continuously, with the system constantly comparing the latest data to the model's predictions, ensuring that the equipment is always being monitored for potential issues.
  • The model is always "on," updating its predictions as new data comes in.